OOgenesis_Pred: A sequence-based method for predicting oogenesis proteins by six different modes of Chou's pseudo amino acid composition

J Theor Biol. 2017 Feb 7:414:128-136. doi: 10.1016/j.jtbi.2016.11.028. Epub 2016 Dec 2.

Abstract

Oogenesis is the formation of ova, or unfertilized eggs, from oogonia by mitotic division and subsequent differentiation. Given this critical role, the identification of oogenesis-related proteins is of great interest. However, the experimental determination of proteins involved in oogenesis is expensive, time-consuming and labor-intensive. A discriminating model that classifies oogenesis-related and non-oogenesis-related proteins with high accuracy and precision is therefore needed. Here, for the first time, we developed a support vector machine based prediction method that differentiates oogenesis from non-oogenesis proteins. By means of informative protein physicochemical properties and a parameter optimization scheme, our method yields robust and consistent performance. Our model achieved 87.68% and 84.82% prediction accuracy in a five-fold cross-validation test on datasets with 90% and 50% identity, respectively. The prediction model was also assessed on an independent dataset and yielded 91.62% and 85.38% prediction accuracy for the datasets with 90% and 50% identity, respectively, which further demonstrates the effectiveness of our method. Moreover, by applying 10 different feature weighting methods, we determined the more important protein features for discriminating oogenesis-related from non-oogenesis-related proteins, including serine and glycine frequency, quasi-sequence-order, pseudo-amino acid composition, distribution and conjoint triad. These success rates indicate that our model is a promising tool for predicting proteins involved in oogenesis. To enhance the practical value of the proposed method, we developed standalone software for predicting oogenesis candidate proteins, called OOgenesis_Pred. This software is the first predictor established for identifying oogenesis proteins. We also demonstrated the capability of OOgenesis_Pred by predicting oogenesis-related proteins among several oogenesis candidate proteins. It is anticipated that OOgenesis_Pred will become a powerful tool for future proteomic studies related to oogenesis.
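
As an illustration of the simplest sequence-derived feature named in the abstract (residue frequency, such as serine and glycine frequency), the following minimal sketch computes the amino acid composition of a protein sequence. It is not the OOgenesis_Pred implementation: the function name `amino_acid_composition` and the toy sequence are hypothetical, and the other feature families (quasi-sequence-order, pseudo-amino acid composition, distribution, conjoint triad) and the SVM itself are not shown.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue in `sequence`.

    Hypothetical helper for illustration; non-standard symbols
    (e.g. X, B, Z) are ignored when counting.
    """
    seq = sequence.upper()
    counts = Counter(res for res in seq if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# Toy sequence made up for this example; not from the paper's datasets.
toy = "MSGSGSSGAK"
comp = amino_acid_composition(toy)

# Serine and glycine frequencies, two of the features the abstract
# reports as important for discrimination.
print("S:", comp["S"], "G:", comp["G"])
```

A 20-dimensional vector of this kind could serve as one block of the input to a classifier such as a support vector machine; how the published method combines and weights its feature blocks is not specified in the abstract.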

Keywords: Data mining; Feature selection; Oogenesis; Protein features; SVM.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Cell Cycle Proteins* / genetics
  • Cell Cycle Proteins* / metabolism
  • Egg Proteins* / genetics
  • Egg Proteins* / metabolism
  • Female
  • Humans
  • Meiosis / physiology*
  • Oogenesis / physiology*
  • Oogonia / metabolism*
  • Predictive Value of Tests
  • Sequence Analysis, Protein

Substances

  • Cell Cycle Proteins
  • Egg Proteins